Abstract. Randomized algorithms and data structures are often analyzed
under the assumption of access to a perfect source of randomness.
The most fundamental metric used to measure how “random” a hash
function or a random number generator is, is its independence: a
sequence of random variables is said to be k-independent if every variable
is uniform and every size k subset is independent.

In this paper we consider three classic algorithms under limited
independence. Besides the theoretical interest in removing the unrealistic
assumption of full independence, the work is motivated by lower
independence being more practical. We provide new bounds for randomized
quicksort, min-wise hashing and largest bucket size under limited
independence. Our results can be summarized as follows.

- Randomized quicksort. When pivot elements are computed using a
5-independent hash function, Karloff and Raghavan, J.ACM’93 showed
$O(n \log n)$ expected worst-case running time for a special version of
quicksort. We improve upon this, showing that the same running
time is achieved with only 4-independence.

- Min-wise hashing. For a set A, consider the probability of a particular
element being mapped to the smallest hash value. It is known
that 5-independence implies the optimal probability $O(1/n)$. Broder
et al., STOC’98 showed that 2-independence implies it is $O(1/\sqrt{|A|})$.
We show a matching lower bound as well as new tight bounds for 3-
and 4-independent hash functions.

- Largest bucket. We consider the case where n balls are distributed
to n buckets using a k-independent hash function and analyze the
largest bucket size. Alon et al., STOC’97 showed that there exists a
2-independent hash function implying a bucket of size $\Omega(n^{1/2})$. We
generalize the bound, providing a k-independent family of functions
that imply size $\Omega(n^{1/k})$.


* Research partly supported by Mikkel Thorup’s Advanced Grant from the Danish 
Council for Independent Research under the Sapere Aude programme and the FNU 
project AlgoDisc - Discrete Mathematics, Algorithms, and Data Structures. 

** This author is supported by the Danish National Research Foundation under the 
Sapere Aude program. 



1 Introduction 


A unifying metric of strength of hash functions and pseudorandom number
generators is the independence of the function. We say that a sequence of random
variables is k-independent if every random variable is uniform and every size
k subset is independent. A question of theoretical interest is, regarding each
algorithmic application, how much independence is required? With the
standard implementation of a random generator or hash function via a k-degree
polynomial, k determines both the space used and the amount of randomness
provided. A typical assumption when performing algorithmic analysis is to just
assume full independence, i.e., that for input size n the hash function is
n-independent. Besides the interest from a theoretical perspective, the question of
how much independence is required is also interesting from a practical
perspective: hash functions and generators with lower independence are as a rule
of thumb faster in practice than those with higher independence. Hence, if it
is proven that the algorithmic application needs only k-independence to work,
then an implementation can gain a speedup by specifically picking a fast
construction that provides the required k-independence. In this paper we
consider three fundamental applications of random hashing, where we provide new
bounds for limited independence.

Min-wise hashing. We consider the commonly used scheme min-wise hashing,
which was first introduced by Broder [2] and has several well-founded
applications (see Section 2). Here we study families of hash functions, where a
function h is picked uniformly at random from the family and applied to all
elements of a set A. For any element $x \in A$ we say that h is min-wise
independent if $\Pr(\min h(A) = x) = 1/|A|$ and $\varepsilon$-min-wise if
$\Pr(\min h(A) = x) = (1 + \varepsilon)/|A|$. For this problem we show new
tight bounds for k = 2, 3, 4 of
$\varepsilon = \Theta(\sqrt{n}), \Theta(\log n), \Theta(\log n)$ respectively,
and for k = 5 it is folklore that $O(1)$-min-wise ($\varepsilon = O(1)$) can be
achieved. Since tight bounds for k > 5 exist (see Section 2), our contribution
closes the problem.
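
The definition of $\varepsilon$-min-wise above can be probed empirically. The following is a minimal sketch, not part of the paper's analysis: `minwise_probability` and `sample_fully_random` are hypothetical helper names, and the fully random family is only a baseline stand-in for a perfect source of randomness.

```python
import random

def minwise_probability(sample_hash, A, x, trials, rng):
    """Estimate the probability that x receives the smallest hash value in A.

    A fresh hash function is drawn from the family for every trial.
    """
    hits = 0
    for _ in range(trials):
        h = sample_hash(rng)            # draw h from the family
        if min(A, key=h) == x:          # x is mapped to the smallest value
            hits += 1
    return hits / trials

def sample_fully_random(rng):
    """Hypothetical fully independent family: a lazily filled random table."""
    table = {}
    def h(key):
        if key not in table:
            table[key] = rng.random()
        return table[key]
    return h

rng = random.Random(1)
A = list(range(10))
estimate = minwise_probability(sample_fully_random, A, x=3, trials=20000, rng=rng)
# Solve Pr = (1 + eps) / |A| for eps; close to 0 for the fully random baseline.
epsilon = estimate * len(A) - 1
```

Swapping `sample_fully_random` for a sampler of a k-independent family gives a rough feel for how far $\varepsilon$ can drift from 0, although the worst-case families behind the lower bounds are constructed adversarially and will not show up by chance.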

Randomized quicksort. Next we consider a classic sorting algorithm
presented in many randomized algorithms books, e.g. already on page three of
Motwani-Raghavan [12]. The classic analysis of quicksort in Motwani-Raghavan
crucially uses the probability of a particular element being mapped to the
smallest hash value out of all the elements: the expected worst-case running
time in this analysis is $O(n \log n \cdot \Pr(\min h(A) = x))$, where A is the
set of n elements to be sorted and $x \in A$. It follows directly from our new
tight min-wise bounds that this analysis cannot be improved further. A special
version of randomized quicksort was shown by Karloff and Raghavan to use
expected worst-case time $O(n \log n)$ when the pivot elements are chosen using
a 5-independent hash function [11]. Our main result is a new general bound for
the number of comparisons performed under limited independence, which applies
to several settings of quicksort, including the setting of Karloff-Raghavan
where we show the same running time using only 4-independence. Furthermore, we
show that k = 2 and k = 3 can imply expected worst-case time
$O(n \log^2 n)$. An interesting observation is that our new bounds for k = 4
and k = 2 show that the classic analysis using min-wise hashing is not tight,
as we go below those bounds by a factor $\log n$ for k = 4 and a factor
$\sqrt{n}/\log n$ for k = 2. Our findings imply that a faster 4-independent
hash function can be used to guarantee the optimal running time for randomized
quicksort, which could potentially be of practical interest. Interestingly, our
new bounds on the number of performed comparisons under limited independence
have implications for classic algorithms for binary planar partitions and
treaps. For binary planar partitions our results imply expected partition size
$O(n \log n)$ for the classic randomized algorithm for computing binary planar
partitions [12, Page 10] under 4-independence. For randomized treaps
[12, Page 201] our new results imply $O(\log n)$ worst-case depth for
4-independence.

Largest bucket size. The last setting we consider is throwing n balls into
n buckets using a k-independent hash function and analyzing the size of the
largest bucket. This can be regarded as load balancing, as the balls can
represent “tasks” and the buckets represent processing units. Our main result
is a family of k-independent hash functions, which when used in this setting
implies largest bucket size $\Omega(n^{1/k})$ with constant probability. This
result was previously known only for k = 2 due to Alon et al. [1] and our
result is a generalization of their bound. As an example of the usefulness of
such bucket size bounds, consider the fundamental data structure, the
dictionary. Widely used algorithms books such as Cormen et al. [7] teach as the
standard method to implement a dictionary to use an array with chaining.
Chaining here simply means that for each key, corresponding to an entry in the
array, we have a linked list (chain) and when a new key-value pair is inserted,
it is inserted at the end of the linked list. Clearly then, searching for a
particular key-value pair takes worst-case time proportional to the size of the
largest chain. Hence, if one is interested in worst-case lookup time guarantees
then the expected largest bucket size formed by the keys in the dictionary is
of great importance.
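
The chaining scheme described above can be sketched in a few lines. This is an illustrative sketch only: `build_chains` and `largest_bucket` are hypothetical names, and the hash function `h` is passed in by the caller rather than drawn from any particular family.

```python
from collections import defaultdict

def build_chains(keys, n, h):
    """Distribute keys over n buckets; each bucket is a chain (a list)."""
    chains = defaultdict(list)
    for key in keys:
        chains[h(key) % n].append(key)   # new keys go to the end of the chain
    return chains

def largest_bucket(keys, n, h):
    """Length of the longest chain, i.e. the worst-case lookup cost."""
    chains = build_chains(keys, n, h)
    return max((len(chain) for chain in chains.values()), default=0)
```

With `h` the identity on the keys 0, ..., n - 1 every chain has length one, while a constant `h` puts all n keys into a single chain; the bounds in this paper concern what happens between these extremes when `h` is drawn from a k-independent family.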


2 Relation to previous work 

We will briefly review related work on the topic of bounding the independence 
used as well as mention some of the popular hash function constructions. 

The line of research that considers the amount of independence required is
substantial. As examples, Pagh et al. [13] showed that linear probing works with
5-independence. For the case of $\varepsilon$-min-wise hashing (“almost”
min-wise hashing as used e.g. in [9]) Indyk showed that
$O(\log \frac{1}{\varepsilon})$-independence is sufficient. For both of the
above problems Thorup and Pătraşcu [15] showed optimality: they show existence
of explicit families of hash functions that for linear probing are
4-independent leading to $\Omega(\log n)$ probes and for $\varepsilon$-min-wise
hashing are $\Omega(\log \frac{1}{\varepsilon})$-independent that imply
$(2\varepsilon)$-min-wise hashing. Additionally, they show that the popular
multiply-shift hashing scheme by Dietzfelbinger et al. [8] is not sufficient
for linear probing and $\varepsilon$-min-wise hashing. In terms of lower
bounds, it was shown by Broder et al. [3] that k = 2 implies
$\Pr(\min h(A) = x) = O(1/\sqrt{|A|})$. We provide a matching lower bound and
new tight bounds for k = 3, 4. Additionally we review a folklore $O(1/n)$
upper bound for k = 5. Our lower bound proofs for min-wise hashing (see
Table 1) for k = 3, 4 are surprisingly similar to those of Thorup and Pătraşcu
for linear probing; in fact we use the same “bad” families of hash functions
but with a different analysis. Further, the same families imply the same
multiplicative factors relative to the optimal. Our new tight bounds together
with the bounds for k > 5 due to [9,15] provide the full picture of how
min-wise hashing behaves under limited independence.

Randomized quicksort [1! ] is well known to sort n elements in expected time
$O(n \log n)$ under full independence. Given that pivot elements are picked by
having n random variables with outcomes 0, ..., n − 1, where the outcome of
variable i in the sequence determines the ith pivot element, running time
$O(n \log n)$ has been shown [11] for k = 5. We improve this and show
$O(n \log n)$ time for k = 4 in the same setting. To the knowledge of the
authors, it is still an open problem to analyze the version of randomized
quicksort presented by e.g. Motwani-Raghavan under limited independence. The
analysis of both the randomized binary planar partition algorithm and the
randomized treap in Motwani-Raghavan is done using the exact same argument as
for quicksort, namely using min-wise hashing, which we show cannot be improved
further and is not tight. Our new quicksort bounds directly translate to
improvements for these two applications. The randomized binary planar partition
algorithm is hence improved to be of expected size $O(n \log^2 n)$ for k = 2
and $O(n \log n)$ for k = 4, and the expected worst case depth of any node in a
randomized treap is improved to be $O(\log^2 n)$ for k = 2 and $O(\log n)$ for
k = 4.

As briefly mentioned earlier, our largest bucket size result is related to the
generalization of Alon et al., STOC’97, specifically [1, Theorem 2]. They show
that for a (perfect square) field $F$ the class $H$ of all linear
transformations between $F^2$ and $F$ has the property that when a hash
function $h \in H$ is picked uniformly at random, an input set of size n
exists so that the largest bucket has size at least $\sqrt{n}$. In terms of
upper bounds for largest bucket size, remember that a family $\mathcal{H}_u$ of
hash functions that map from $\mathcal{U}$ to [n] is universal [ ] if for an h
picked uniformly from $\mathcal{H}_u$ it holds

$$\forall x \neq y \in \mathcal{U} : \Pr(h(x) = h(y)) \le 1/n.$$

Universal hash functions are known to have expected largest bucket size at most
$\sqrt{n} + 1/2$, hence essentially tight compared to the $\sqrt{n}$ lower
bound of Alon et al. On the other end of the spectrum, full independence is
known to give expected largest bucket size $\Theta(\log n/\log\log n)$ due to a
standard application of Chernoff bounds. This bound was proven to hold for
$\Theta(\log n/\log\log n)$-independence as well [16]. In Section 7.1 we
additionally review a folklore upper bound coinciding with our new lower bound.
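
The universality condition can be checked exactly on a small instance. The sketch below uses the classic family $h_{a,b}(x) = ((ax + b) \bmod p) \bmod n$ with $a \neq 0$, which is an assumption chosen for illustration and is not a family taken from this paper; `max_collision_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations

def max_collision_probability(p, n):
    """Exact maximum over x != y in [p] of Pr(h(x) = h(y)).

    The probability is over a uniform choice of (a, b) with 1 <= a < p and
    0 <= b < p, for h(x) = ((a*x + b) mod p) mod n and p prime.
    """
    family = [(a, b) for a in range(1, p) for b in range(p)]
    worst = Fraction(0)
    for x, y in combinations(range(p), 2):
        collisions = sum(
            1 for a, b in family
            if ((a * x + b) % p) % n == ((a * y + b) % p) % n
        )
        worst = max(worst, Fraction(collisions, len(family)))
    return worst

worst = max_collision_probability(p=17, n=4)
```

For this instance the worst pair collides with probability 7/34, which is below the 1/n = 1/4 threshold in the definition.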

Since the question of how much independence is needed from a practical
perspective often could be rephrased “how fast a hash function can I use and
maintain algorithmic guarantees?” we will briefly recap some used hash functions
and pseudorandom generators. Functions with lower independence are typically
faster in practice than functions with higher. The formalization of this is due
to Siegel’s lower bound [17], where he shows that in the cell probe model, to
achieve k-independence with a number of probes t < k, you need space
$k(n/k)^{1/t}$. Since space usage scales with the independence k, for high k
the effects of the memory hierarchy will mean that even if the time is held
constant the practical time will scale with k, as cache effects etc. impact the
running time.

The most used hashing scheme in practice is, as mentioned, the 2-independent
multiply-shift by Dietzfelbinger et al. [8], which can be twice as fast [19]
compared to even the simplest linear transformation
$x \mapsto (ax + b) \bmod p$. For 3-independence we have, due to (analysis by)
Thorup and Pătraşcu, the simple tabulation scheme [14], which can be altered to
give 5-universality [20]. For general k-independent hash functions the standard
solution is degree k − 1 polynomials, however especially for low k these are
known to run slowly, e.g. for k = 5 polynomial hashing is 5 times slower than
the tabulation based solution of [20]. Alternatively, for high independence the
double tabulation scheme by Thorup [18], which builds on Siegel’s result [17],
can potentially be practical. On smaller universes Thorup gives explicit and
practical parameters for 100-independence. Also for high independence, the
nearly optimal hash function of Christiani et al. [6] should be practical. For
generating k-independent variables, Christiani and Pagh’s constant time
generator [5] performs well: their method is an order of magnitude faster than
evaluating a polynomial using fast Fourier transform. We note that even though
constant time generators as the above exist, the practical evaluation will
actually scale with the independence, as the memory usage of the methods
depends on the independence and so the effects of the underlying memory
hierarchy come into effect.
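
The degree k − 1 polynomial construction mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: arithmetic is over the integers modulo a prime `p` supplied by the caller, keys are assumed to lie in [p], and the names `evaluate_polynomial` and `sample_polynomial_hash` are hypothetical.

```python
import random

def evaluate_polynomial(coefficients, x, p):
    """Evaluate c_0 + c_1 x + ... + c_{k-1} x^{k-1} modulo p by Horner's rule."""
    value = 0
    for c in reversed(coefficients):
        value = (value * x + c) % p
    return value

def sample_polynomial_hash(k, p, rng):
    """Draw a hash function from the degree k - 1 polynomial family over Z_p.

    The k coefficients are chosen independently and uniformly from [p], so
    both the description size and the evaluation time grow linearly with k.
    """
    coefficients = [rng.randrange(p) for _ in range(k)]
    return lambda x: evaluate_polynomial(coefficients, x, p)
```

A typical call would be `h = sample_polynomial_hash(5, 2**61 - 1, random.Random(7))`, using the Mersenne prime $2^{61} - 1$ as the modulus. The k multiplications per evaluation are one reason tabulation based schemes can be faster for small k.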

Finally, we would like to note that the paradigm of independence has its
limitations in the sense that even though one can prove that k-independence by
itself does not imply certain algorithmic guarantees, it cannot be ruled out
that k-independent hash functions exist that do. That is, lower bound proofs
typically construct artificial families to provide counterexamples, which in
practice would not come into play. As an example, consider that linear probing
needs 5-independence to work as mentioned above but it has been proven to work
with simple tabulation hashing [14], which only has 3-independence.


3 Our results 

With regard to min-wise hashing, we close this version of the problem by
providing new and tight bounds for k = 2, 3, 4. We consider the following
setting: let A be a set of size n and let $\mathcal{H}$ be a k-independent
family of hash functions. We examine the probability of any element $x \in A$
receiving the smallest hash value h(x) out of all elements in A when
$h \in \mathcal{H}$ is picked uniformly at random. For the case of
k = 2, 3, 4-independent families we provide new bounds as shown in Table 1,
which provides a full understanding of the parameter space as a tight bound of
$\Pr(\min h(A) = x) = \Theta(1/n)$ is known for $k \ge 5$ due to Indyk [9]. We
make note that our lower bound proofs, which work by providing explicit “bad”
families of functions, share similarity with Thorup and Pătraşcu’s
[15, Table 1] proof of linear probing. In fact, our bad families of functions
used are exactly the same, while the analysis is different. Surprisingly, the
constructions imply the same factor relative to optimal as in linear probing,
for every examined value of k.

              k = 2                   k = 3                   k = 4                   k ≥ 5
Upper bound   $O(\sqrt{n}/n)$         $O((\log n)/n)$*        $O((\log n)/n)$*        $O(1/n)$
Lower bound   $\Omega(\sqrt{n}/n)$*   $\Omega((\log n)/n)$*   $\Omega((\log n)/n)$*   $\Omega(1/n)$

Table 1. Result overview for min-wise hashing. Results in this paper are marked
with *. For a set A of size n and an element $x \in A$ the cells correspond to
the probability $\Pr(\min h(A) = x)$ for a hash function h picked uniformly at
random from a k-independent family $\mathcal{H}$.

Next, we consider randomized quicksort under limited independence. In the
same setting as Karloff and Raghavan [11] our main result is that 4-independence
is sufficient for the optimal $O(n \log n)$ expected worst-case running time.
The setting is essentially that pivot elements are picked from a sequence of
k-independent random variables that are pre-computed. Our results apply to a
related setting of quicksort as well as to the analysis of binary planar
partitions and randomized treaps. Our results are summarized in Table 2.

              k = 2               k = 3               k = 4               k ≥ 5
Upper bound   $O(n \log^2 n)$*    $O(n \log^2 n)$*    $O(n \log n)$*      $O(n \log n)$
Lower bound   $\Omega(n \log n)$  $\Omega(n \log n)$  $\Omega(n \log n)$  $\Omega(n \log n)$

Table 2. Result overview for randomized quicksort. Results in this paper are
marked with *. When our hash function h is picked uniformly from a
k-independent family $\mathcal{H}$ then the cells in the table denote the
expected running time to sort n distinct elements. The 5-independent upper
bound is from Karloff-Raghavan [11].


Finally, we consider the fundamental case of throwing n balls into n buckets.
The main result is a simple k-independent family of functions which when used
to throw the balls implies that with constant probability the largest bucket
has $\Omega(n^{1/k})$ balls. We show the theorem below.

Theorem 1. Consider the setting where n balls are distributed among n buckets
using a random hash function h. For $m \le n$ and any $k \in \mathbb{N}$ such
that $k < n^{1/k}$ and $m^k \ge n$ a k-independent distribution over hash
functions exists such that the largest bucket size is $\Omega(m)$ with
probability $\Omega\left(\frac{n}{m^k}\right)$ when h is chosen according to
this distribution.

An implication of Theorem 1 is that we now have the full understanding of
the parameter space for this problem, as it was well known that independence
$k = O(\log n/\log\log n)$ implied $\Theta(\log n/\log\log n)$ balls in the
largest bucket. We summarize with the corollary below.

Corollary 1. Consider the setting where n balls are distributed among n buckets
using a random hash function h. Given an integer k a distribution over hash
functions exists such that if h is chosen according to this distribution then
with L being the size of the largest bucket

(a) if $k < n^{1/k}$ then $L = \Omega(n^{1/k})$ with probability $\Omega(1)$.

(b) if $k > n^{1/k}$ then $L = \Omega(\log n/\log\log n)$ with probability $\Omega(1)$.

We note that the result of Theorem 1 is not quite the generalization of the
lower bound of Alon et al. since they show $\Omega(n^{1/2})$ largest bucket
size for any linear transformation while our result provides an explicit
worst-case k-independent scheme to achieve largest bucket size
$\Omega(n^{1/k})$. However, as is evident from the proof in the next section,
our scheme is not that artificial: in fact it is “nearly” standard polynomial
hashing, providing hope that the true generalization of Alon et al. can be
shown.


4 Preliminaries 

We will introduce some notation and fundamentals used in the paper. For an
integer n we let [n] denote {0, ..., n − 1}. For an event E we let [E] be the
variable that is 1 if E occurs and 0 otherwise. Unless explicitly stated
otherwise, log n refers to the base 2 logarithm of n. For a real number x and a
non-negative integer k we define $x^{\underline{k}}$ as
$x(x-1)\cdots(x-(k-1))$.
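
The notation above translates directly into code. The sketch below is illustrative only; `falling_factorial` and `indicator` are hypothetical helper names, and `range(n)` plays the role of [n].

```python
def falling_factorial(x, k):
    """Return x (x - 1) ... (x - (k - 1)); the empty product for k = 0 is 1."""
    result = 1
    for j in range(k):
        result *= x - j
    return result

def indicator(event):
    """The variable [E]: 1 if the event occurs and 0 otherwise."""
    return 1 if event else 0

# [n] = {0, ..., n - 1} corresponds to range(n); count the even members of [10].
even_count = sum(indicator(i % 2 == 0) for i in range(10))
```

For instance, `falling_factorial(5, 3)` is 5 · 4 · 3 = 60, and x need not be an integer.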

The paper is about application bounds when the independence of the random 
variables used is limited. We define independence of a hash function formally 
below. 

Definition 1. Let $h : \mathcal{U} \to V$ be a random hash function,
$k \in \mathbb{N}$ and let $u_1, \dots, u_k$ be any distinct k elements from
$\mathcal{U}$ and $v_1, \dots, v_k$ be any k elements from $V$.

Then h is k-independent if it holds that

$$\Pr(h(u_1) = v_1 \wedge \dots \wedge h(u_k) = v_k) = \frac{1}{|V|^k}.$$

Note that an equivalent definition for a sequence of random variables holds:
they are k-independent if any element is uniformly distributed and every
k-tuple of them is independent.
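
Definition 1 can be verified by brute force on a tiny universe. The sketch below enumerates all polynomials of degree below k over $\mathbb{Z}_p$ for a small prime p; the helper names `polynomial_family` and `is_k_independent` are hypothetical, and each hash function is stored as a value table indexed by the key.

```python
from fractions import Fraction
from itertools import combinations, product

def polynomial_family(k, p):
    """All polynomials of degree < k over Z_p, each as a tuple of p values."""
    family = []
    for coefficients in product(range(p), repeat=k):
        table = tuple(
            sum(c * x**j for j, c in enumerate(coefficients)) % p
            for x in range(p)
        )
        family.append(table)
    return family

def is_k_independent(family, num_values, k):
    """Check Definition 1 exactly for h drawn uniformly from the family."""
    universe = range(len(family[0]))
    target = Fraction(1, num_values ** k)
    for keys in combinations(universe, k):            # distinct u_1, ..., u_k
        for outputs in product(range(num_values), repeat=k):
            matches = sum(
                1 for h in family
                if all(h[u] == v for u, v in zip(keys, outputs))
            )
            if Fraction(matches, len(family)) != target:
                return False
    return True

family = polynomial_family(k=2, p=5)
```

On this instance the 25 linear polynomials over $\mathbb{Z}_5$ pass the check for k = 1 and k = 2 but fail it for k = 3, since 25 functions cannot realize a probability of 1/125.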


5 Min-wise hashing 

In this section we show the bounds that can be seen in Table 1. As mentioned
earlier, there is a close relationship between the worst case query time of an
element in linear probing and min-wise hashing when analysed under the
assumption of hash functions with limited independence. Intuitively, long query
time for linear probing is caused by many hash values being “close” to the hash
value of the query element. On the other hand a hash value is likely to be the
minimum if it is “far away” from the other hash values. So intuitively, min-wise
hashing and linear probing are related by the fact that good guarantees require
a “sharp” concentration on how close to the hash value of the query element the
other hash values are.


5.1 Upper bounds 


We show the following theorem which results in the upper bounds shown in 
Table 1. Note that the bound for 4-independence follows trivially from the bound 
for 3-independence and that the 5-independence bound is folklore but included 
for completeness. 


Theorem 2. Let $X = \{x_0, x_1, \dots, x_n\}$ and $h : X \to (0,1)$ be a hash
function. If h is 3-independent

$$\Pr\left(h(x_0) < \min_{i \in \{1,\dots,n\}} \{h(x_i)\}\right) = O\left(\frac{\log(n+1)}{n+1}\right)$$

If h is 5-independent

$$\Pr\left(h(x_0) < \min_{i \in \{1,\dots,n\}} \{h(x_i)\}\right) = O\left(\frac{1}{n+1}\right)$$



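For notational convenience let $E$ denote the event
$\left(h(x_0) < \min_{i \in \{1,\dots,n\}} \{h(x_i)\}\right)$. First assume that
h is 3-independent. Fix $h(x_0) = \alpha \in (0,1)$. Then h is 2-independent on
the remaining keys. Let $Z = \sum_{i=1}^{n} [h(x_i) < \alpha]$. Then under the
assumption $h(x_0) = \alpha$:

$$\Pr(E \mid h(x_0) = \alpha) = \Pr(Z = 0 \mid h(x_0) = \alpha) \le \Pr(|Z - \mathrm{E}Z| \ge \mathrm{E}Z \mid h(x_0) = \alpha)$$

Now since h is 2-independent on the remaining keys we see that
$\Pr(E \mid h(x_0) = \alpha)$ is upper bounded by (using Fact 6):

$$\Pr(|Z - \mathrm{E}Z| \ge \mathrm{E}Z \mid h(x_0) = \alpha) \le \frac{\mathrm{E}\left((Z - \mathrm{E}Z)^2\right)}{(\mathrm{E}Z)^2} = O\left(\frac{1}{\mathrm{E}Z}\right) = O\left(\frac{1}{n\alpha}\right) \tag{1}$$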


Hence:

$$\Pr(E) = \int_0^1 \Pr(E \mid h(x_0) = \alpha)\, d\alpha \le \frac{1}{n} + \int_{1/n}^{1} O\left(\frac{1}{n\alpha}\right) d\alpha = O\left(\frac{\log(n+1)}{n+1}\right) \tag{2}$$

This proves the first part of the theorem. Now assume that h is 5-independent
and define Z in the same way as before. In the same manner as we established the
upper bound for $\Pr(E \mid h(x_0) = \alpha)$ in (1) we see that it is now upper
bounded by (using Fact 6):

$$\Pr(|Z - \mathrm{E}Z| \ge \mathrm{E}Z \mid h(x_0) = \alpha) \le \frac{\mathrm{E}\left((Z - \mathrm{E}Z)^4\right)}{(\mathrm{E}Z)^4} = O\left(\frac{1}{(\mathrm{E}Z)^2}\right)$$

In the same manner as in (2) we now see that

$$\Pr(E) = \int_0^1 \Pr(E \mid h(x_0) = \alpha)\, d\alpha \le \frac{1}{n} + \int_{1/n}^{1} O\left(\frac{1}{(n\alpha)^2}\right) d\alpha = O\left(\frac{1}{n+1}\right)$$


5.2 Lower bounds 

We first show the k = 4 lower bound seen in Table 1. As mentioned earlier, the
argument follows from the same “bad” distribution as Thorup and Pătraşcu [15],
but with a different analysis.

Theorem 3. For any key set $X = \{x_0, x_1, \dots, x_n\}$ there exists a random
hash function $h : X \to (0,1)$ that is 4-independent such that

$$\Pr(h(x_0) < \min\{h(x_1), \dots, h(x_n)\}) = \Omega\left(\frac{\log(n+1)}{n+1}\right) \tag{3}$$

Proof. We consider the strategy from Thorup and Pătraşcu [15, Section 2.3]
where we hash X into [t], where t is a power of 2 such that $t = \Theta(n)$. We
use the strategy to determine the first log t bits of the values of h and let
the remaining bits be chosen independently and uniformly at random. The
strategy ensures that for every $\ell \in [c_1 \log t, c_2 \log t]$ (where
$c_1 < c_2$ denote the constant fractions used in [15, Section 2.3]) with
probability $\Theta(2^\ell/n)$ there exists an interval $I$ of size
$\Theta(2^{-\ell})$ such that $h(x_0)$ is uniformly distributed in $I$ and $I$
contains at least $\frac{n}{2^\ell} \cdot (1 + \Omega(1))$ keys from X.
Furthermore these events are disjoint. From the definition of the algorithm we
see that for every $\ell \in [c_1 \log t, c_2 \log t]$ with probability
$\Theta(2^\ell/n)$ there exists an interval $I$ of size $\Theta(2^{-\ell})$
such that $h(x_0)$ is uniformly distributed in $I$ and $I$ contains no other
element than $h(x_0)$. Let y be the maximal value of all of
$h(x_1), \dots, h(x_n)$ which are smaller than $h(x_0)$ and 0 if all hash values
are greater than $h(x_0)$. Then we know that:

$$\mathrm{E}(h(x_0) - y) \ge \sum_{\ell \in [c_1 \log t,\, c_2 \log t]} \Omega\left(\frac{1}{n}\right) = \Omega\left(\frac{\log n}{n}\right)$$

We now define the hash function $h' : X \to (0,1)$ by
$h'(x) = (h(x) - z) \bmod 1$ where $z \in (0,1)$ is chosen uniformly at random.
Now fix the choice of h. Then $h'(x_0)$ is smaller than
$\min\{h'(x_1), \dots, h'(x_n)\}$ if $z \in (y, h(x_0))$. Hence for this fixed
choice of h:

$$\Pr(h'(x_0) < \min\{h'(x_1), \dots, h'(x_n)\} \mid h) \ge h(x_0) - y$$

Therefore

$$\Pr(h'(x_0) < \min\{h'(x_1), \dots, h'(x_n)\}) \ge \mathrm{E}(h(x_0) - y) = \Omega\left(\frac{\log n}{n}\right) = \Omega\left(\frac{\log(n+1)}{n+1}\right)$$

and $h'$ satisfies (3). □
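
The random shift used at the end of the proof can be illustrated on concrete numbers. This is a small sketch with made-up hash values; `shift` and `gap_below` are hypothetical helper names, and index 0 plays the role of $x_0$.

```python
def shift(values, z):
    """Apply h'(x) = (h(x) - z) mod 1 to every hash value."""
    return [(v - z) % 1.0 for v in values]

def gap_below(values, index):
    """Return h(x_0) - y, where y is the largest value below values[index].

    If no value is smaller, y is taken to be 0 as in the proof.
    """
    below = [v for v in values if v < values[index]]
    return values[index] - max(below, default=0.0)

values = [0.50, 0.20, 0.90, 0.35]    # h(x_0) = 0.50, so y = 0.35
inside = shift(values, 0.40)         # z in (y, h(x_0)): x_0 becomes the minimum
outside = shift(values, 0.10)        # z outside the gap: x_0 is not the minimum
```

Any shift z that falls in the gap (y, h(x_0)) wraps the smaller values around to the top of the unit interval, which is why the probability of $x_0$ becoming the minimum is at least the expected length of that gap.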

The lower bound for k = 2 seen in Table 1 is shown in the following theorem, 
using a probabilistic mix between distribution strategies as the main ingredient. 


Theorem 4. For any key set $X = \{x_0, x_1, \dots, x_n\}$ there exists a random
hash function $h : X \to [0,1)$ that is 2-independent such that

$$\Pr\left(h(x_0) < \min_{i \in \{1,\dots,n\}} \{h(x_i)\}\right) = \Omega\left(\frac{1}{\sqrt{n}}\right)$$

Proof. Since we are only interested in proving the asymptotic result, and have
no intentions of optimizing the constant, we can wlog. assume that
$10\sqrt{n}$ is an integer that divides n. To shorten notation we let
$\ell = 10\sqrt{n}$.

We will now consider four different strategies for assigning h, and they will
choose a hash function $g : X \to [\ell + 1]$. Then we let $(U_x)_{x \in X}$ be
a family of independent random variables uniformly distributed in (0,1) and
define $h(x) = \frac{g(x) + U_x}{\ell + 1}$. The high-level approach is to
define distribution strategies such that some have too high pair-collision
probability, some have too low, and likewise for the probability of hashing to
the same value as $x_0$. Then we mix over the strategies with probabilities such
that in expectation we get the correct number of collisions but we maintain an
increased probability of $x_0$ hashing to a smaller value than the rest of the
keys. We will now describe the four strategies for choosing g.

- Strategy $S_1$: $g(x_0)$ is uniformly chosen. Then $(g(x))_{x \ne x_0}$ is
chosen uniformly at random such that $g(x) \ne g(x_0)$ and for each
$y \ne g(x_0)$ there are exactly $n/\ell$ hash values equal to y.

- Strategy $S_2$: $g(x_0)$ is uniformly chosen, and $y_1$ is uniformly chosen
such that $y_1 \ne g(x_0)$. For each $x \in X \setminus \{x_0\}$ we define
$g(x) = y_1$.

- Strategy $S_3$: $g(x_0)$ is uniformly chosen. Then
$Z \subset X \setminus \{x_0\}$ is chosen uniformly at random such that
$|Z| = 2n/\ell$. We define $g(z) = g(x_0)$ for every $z \in Z$. Then
$(g(x))_{x \ne x_0,\, x \notin Z}$ is chosen uniformly at random under the
constraint that $g(x) \ne g(x_0)$ and for each $y \ne g(x_0)$ there are at most
$n/\ell$ hash values equal to y.

- Strategy $S_4$: $y \in [\ell + 1]$ is uniformly chosen and $g(x) = y$ for each
$x \in X$.

For each of the four strategies we compute the probability that
$g(x_0) = g(x)$ and $g(x) = g(x')$ for each $x, x' \in X \setminus \{x_0\}$.
Because of symmetry the answer is independent of the choice of x and x'. This
is a trivial exercise and the results are summarized in Table 3.


Strategy   $\Pr_{S_i}(g(x_0) = g(x))$   $\Pr_{S_i}(g(x) = g(x'))$
$S_1$      0                            $\frac{n/\ell - 1}{n - 1}$  $\left(< \frac{1}{\ell+1}\right)$
$S_2$      0                            1
$S_3$      $\frac{1}{5\sqrt{n}}$        $\le \frac{n/\ell - 1}{n - 1} + \frac{|Z|(|Z| - 1)}{n(n-1)}$  $\left(< \frac{1}{\ell+1}\right)$
$S_4$      1                            1

Table 3. Strategies for choosing function h and their collision probabilities
for $x, x' \in X \setminus \{x_0\}$. The main idea is that there are two
strategies with too low probability and two with too high probability, for both
types of collisions. However, we can mix probabilistically over the strategies
to achieve the theorem.


For event E and strategy S let $\Pr_S(E)$ be the probability of E under
strategy S. First we define the strategy $T_1$ that chooses strategy $S_1$ with
probability $p_1$ and strategy $S_2$ with probability $1 - p_1$. We choose
$p_1$ such that $\Pr_{T_1}(g(x) = g(x')) = \frac{1}{\ell+1}$. Then
$p_1 \ge 1 - \frac{1}{\ell+1}$. Likewise we define the strategy $T_2$ that
chooses strategy $S_3$ with probability $p_2$ and strategy $S_4$ with
probability $1 - p_2$ such that $\Pr_{T_2}(g(x) = g(x')) = \frac{1}{\ell+1}$.
Then $p_2 \ge 1 - \frac{1}{\ell+1}$ as well. Then:

$$\Pr_{T_1}(g(x) = g(x_0)) = 0 < \frac{1}{\ell+1} < \frac{2}{\ell+1} = \left(1 - \frac{1}{\ell+1}\right) \frac{2}{\ell} \le \Pr_{T_2}(g(x) = g(x_0))$$

Now we define strategy $T^*$ that chooses strategy $T_1$ with probability q and
$T_2$ with probability $1 - q$. We choose q such that
$\Pr_{T^*}(g(x) = g(x_0)) = \frac{1}{\ell+1}$. Then
$q \ge 1 - \frac{1/(\ell+1)}{2/(\ell+1)} = \frac{1}{2}$. Hence $T^*$ chooses
strategy $S_1$ with probability
$\ge \frac{1}{2}\left(1 - \frac{1}{\ell+1}\right) = \Omega(1)$.

The strategy $T^*$ implies a 2-independent g, since due to the mix of
strategies the pairs of keys collide with the correct probability, that is, the
same probability as under full independence. Further, with constant probability
$g(x_0)$ is unique. Hence with probability
$\Omega\left(\frac{1}{\ell+1}\right) = \Omega\left(\frac{1}{\sqrt{n}}\right)$,
$g(x_0) = 0$ and $g(x_0)$ is unique. In this case $h(x_0)$ is the minimum of
all $h(x), x \in X$, which concludes the proof. □

6 Quicksort 

The textbook version of the quicksort algorithm, as explained in [12], is the
following. As input we are given a set of n numbers
$S = \{x_0, \dots, x_{n-1}\}$ and we uniformly at random choose a pivot element
$x_i$. We then compare each element in S with $x_i$, and determine the sets
$S_1$ and $S_2$ which consist of the elements that are smaller and greater than
$x_i$ respectively. Then we recursively call the procedure on $S_1$ and $S_2$
and output the sorted sequence $S_1$ followed by $x_i$ and $S_2$. For this
setting there are to the knowledge of the authors no known bounds under limited
independence.
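
The textbook procedure can be written out as follows. This is a plain sketch that assumes distinct elements and draws pivots from Python's `random.Random`, which stands in for a fully random source; the comparison counter is an addition for illustration.

```python
import random

def quicksort(items, rng, counter):
    """Sort distinct items; counter[0] accumulates element comparisons."""
    if len(items) <= 1:
        return list(items)
    pivot_index = rng.randrange(len(items))   # uniformly random pivot
    pivot = items[pivot_index]
    smaller, greater = [], []
    for j, x in enumerate(items):
        if j == pivot_index:
            continue
        counter[0] += 1                       # one comparison with the pivot
        (smaller if x < pivot else greater).append(x)
    return quicksort(smaller, rng, counter) + [pivot] + quicksort(greater, rng, counter)

data = random.Random(0).sample(range(1000), 200)
counter = [0]
result = quicksort(data, random.Random(1), counter)
```

Each recursive call draws a fresh pivot from the current subproblem, which is exactly the feature that makes this version harder to analyze under limited independence than the pre-computed pivot sequences of the two settings below.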

We consider two different settings where our results seen in Table 2 apply.

Setting 1. Firstly, we consider the same setting as in [11]. Let the input again
be $S = \{x_0, \dots, x_{n-1}\}$. The pivot elements are pre-computed the
following way: let random variables $Y_1, \dots, Y_n$ be k-independent and each
$Y_i$ is uniform over [n]. The ith pivot element is chosen to be $x_{Y_i}$.
Note that the sequence of $Y_i$'s is not always a permutation, hence a
“cleanup” phase is necessary afterwards in order to ensure pivots have been
performed on all elements.
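
One way to read Setting 1 operationally is sketched below. Several details are assumptions made for illustration and are not specified in the text: repeated values in the sequence are skipped, the cleanup phase pivots on the unused indices in index order, and comparisons are counted by inserting the pivots into a binary search tree, which performs the same comparisons as quicksort with those pivots. The function names are hypothetical.

```python
def pivot_order_setting1(pivot_indices, n):
    """First occurrences of Y_1, ..., Y_n followed by a cleanup pass."""
    seen, order = set(), []
    for y in pivot_indices:
        if y not in seen:
            seen.add(y)
            order.append(y)
    # Cleanup (assumed order): every index that was never drawn.
    order.extend(i for i in range(n) if i not in seen)
    return order

def count_comparisons(values, order):
    """Comparisons made when the elements become pivots in the given order.

    Each new pivot is compared once with every earlier pivot on its search
    path, mirroring the partition steps of quicksort on distinct values.
    """
    left, right = {}, {}
    root, comparisons = None, 0
    for idx in order:
        if root is None:
            root = idx
            continue
        node = root
        while True:
            comparisons += 1
            child = left if values[idx] < values[node] else right
            if node not in child:
                child[node] = idx
                break
            node = child[node]
    return comparisons
```

On seven sorted values the balanced order `[3, 1, 5, 0, 2, 4, 6]` costs 10 comparisons, while the order `[0, 1, ..., 6]` costs 21, which is the quadratic worst case.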

Setting 2. The second setting we consider is the following. Let
$Z = Z_1, \dots, Z_n$ be a sequence of k-independent random variables that are
uniform over the interval (0,1). Let min(j, Z) denote the index i of the j'th
smallest $Z_i$. We choose pivot element number j to be $x_{\min(j,Z)}$. Note
that the sequence Z here defines a permutation with high probability and so we
can simply repeat the random experiments if any $Z_i$ collide.
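
The pivot order of Setting 2 is simply the order of the indices sorted by their Z values. A minimal sketch, with the hypothetical name `pivot_order_setting2` and 0-based indices:

```python
def pivot_order_setting2(z):
    """Return the indices ordered by increasing Z value, or None on a collision.

    Entry j of the result is the index of the (j + 1)'th smallest value, so it
    corresponds to min(j + 1, Z) in the notation of the text.
    """
    if len(set(z)) != len(z):
        return None          # the caller redraws Z, as the text suggests
    return sorted(range(len(z)), key=lambda i: z[i])
```

For example, `pivot_order_setting2([0.7, 0.1, 0.4])` yields `[1, 2, 0]`: the element with the smallest Z value becomes the first pivot.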

In this section we show the results of Table 2 in Setting 1. We refer to
Appendix A.1 for proofs for Setting 2 and note that the same bounds apply to
both settings.

Recall that we can use the results on min-wise hashing to show upper bounds
on the running time. The key to sharpening this analysis is to consider a
problem related to that of min-wise hashing. In Lemma 1 we show that for two
sets A, B satisfying $|A| \le |B|$ there are only $O(1)$ pivot elements chosen
from A before the first element is chosen from B. We could use a min-wise type
of argument to show that a single element $a \in A$ is chosen as a pivot
element before the first pivot element is chosen from B with probability at
most $O\left(\frac{\log n}{|B|}\right)$. However, this would only give us an
upper bound of $O(\log n)$ and not $O(1)$.

Lemma 1. Let $h : [n] \to [n]$ be a 4-independent hash function and let
$A, B \subseteq [n]$ be disjoint sets such that $|A| \le |B|$. Let
$j \in [n]$ be the smallest value such that $h(j) \in B$, and $j = n$ if no
such j exists. Then let C be the number of $i \in [j]$ such that
$h(i) \in A$, i.e.

$$C = |\{i \in [n] \mid h(i) \in A,\ h(0), \dots, h(i-1) \notin B\}|$$

Then $\mathrm{E}(C) = O(1)$.
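
The quantity C from Lemma 1 is straightforward to compute for a given sequence of hash values. A small sketch, where `count_before_first_hit` is a hypothetical name and `h_values[i]` stands for h(i):

```python
def count_before_first_hit(h_values, A, B):
    """C from Lemma 1: indices i with h(i) in A and h(0), ..., h(i-1) not in B."""
    count = 0
    for value in h_values:
        if value in B:       # first index j with h(j) in B: stop counting
            break
        if value in A:
            count += 1
    return count
```

For the sequence 3, 1, 3, 7, 1 with A = {1, 3} and B = {7}, three values land in A before the first value lands in B, so C = 3.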

Before we prove Lemma 1 we first show how to apply it to guarantee that 
quicksort only makes O (nlogn) comparisons. 

Theorem 5. Consider quicksort in Setting 1 where we sort a set $S =
\{x_0, \ldots, x_{n-1}\}$ and pivot elements are chosen using a 4-independent hash
function. For any $i$ the expected number of times $x_i$ is compared with another
element $x_j \in S \setminus \{x_i\}$ when $x_j$ is chosen as a pivot element is $O(\log n)$. In
particular the expected running time is $O(n \log n)$.

Proof. Let $\pi : [n] \to [n]$ be a permutation of $[n]$ such that $x_{\pi(0)}, \ldots, x_{\pi(n-1)}$ is
sorted ascendingly. Then $\pi \circ h$ is a 4-independent function as well, and therefore
wlog. we assume that $x_0, \ldots, x_{n-1}$ is sorted ascendingly.

Fix $i \in [n]$ and let $X = \{x_{i+1}, \ldots, x_{n-1}\}$. First we will upper bound the
expected number of comparisons $x_i$ makes with elements from $X$ when an element
of $X$ is chosen as pivot. We let $A_\ell$ and $B_\ell$ be the sets defined by

\[ A_\ell = \{x_j \mid j \in [i, i + 2^{\ell-1}) \cap [n]\}, \qquad B_\ell = \{x_j \mid j \in [i + 2^{\ell-1}, i + 2^{\ell}) \cap [n]\} \]

For any $x_j \in A_\ell$, $x_j$ is compared with $x_i$ only if it is chosen as a pivot element
before any element of $B_\ell$ is chosen as a pivot element. By Lemma 1 the expected
number of times this happens is $O(1)$ for a fixed $\ell$ since $|B_\ell| \ge |A_\ell|$. Since $A_\ell$
is empty when $\ell > 1 + \log n$ we see that $x_i$ is in expectation only compared
$O(\log n)$ times to the elements of $X$. We use an analogous argument to count
the number of comparisons between $x_i$ and $x_0, x_1, \ldots, x_{i-1}$ and so we have that
every element makes in expectation $O(\log n)$ comparisons. As we have $n$
elements it follows directly from linearity of expectation that the total number of
comparisons made is in expectation $O(n \log n)$. The last minor ingredient is the
running time of the cleanup phase of Setting 1. We show in Lemma 2 that this
uses expected time $O(n \log n)$ for $k = 2$, hence the stated running time of the
theorem follows. □

We now show Lemma 1, which was a crucial ingredient in the above proof. 

Proof (of Lemma 1). Wlog. assume that $|A| = |B|$ and let $m$ be the size of $A$ and
$B$. Let $\alpha = \frac{m}{n}$.

For each non-negative integer $\ell \ge 0$ let $C_\ell = \{i \in [n] \mid i < 2^\ell,\ h(i) \in A\}$. Let
$E_\ell$ be the event that $h(j) \notin B$ for all $j \in [n]$ such that $j < 2^\ell$. It is now easy
to see that if $i$ is counted in $C$ then for some integer $\ell \le 1 + \lg n$, $i \in C_\ell$ and $E_{\ell-1}$
occurs. Hence:

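\[ E(C) \le \sum_{\ell=0}^{\lfloor \lg n \rfloor + 1} E\left(|C_\ell| \cdot [E_{\ell-1}]\right) \tag{4} \]

Now we note that

\[ E\left(|C_\ell| [E_{\ell-1}]\right) \le E\left((|C_\ell| - \alpha 2^{\ell+1})^+\right) + E\left(\alpha 2^{\ell+1} \cdot [E_{\ell-1}]\right) \tag{5} \]

where $x^+$ is defined as $\max\{x, 0\}$.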

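First we will bound $E\left((|C_\ell| - \alpha 2^{\ell+1})^+\right)$ when $\alpha 2^\ell \ge 1$. Note that for any
$r \in \mathbb{N}$:

\begin{align}
\Pr\left((|C_\ell| - \alpha 2^{\ell+1})^+ \ge r\right) &= \Pr\left(|C_\ell| - E(|C_\ell|) \ge \alpha 2^\ell + r\right) \tag{6} \\
&\le \frac{E\left(|C_\ell| - E|C_\ell|\right)^4}{(\alpha 2^\ell + r)^4} \tag{7}
\end{align}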

Now consider Facts 6 and 7 which we will use together with (7).

Fact 6. Let $X = \sum_{i=1}^{n} X_i$ where $X_1, \ldots, X_n$ are $k$-independent random variables
in $[0,1]$ for some even constant $k \ge 2$. Then

\[ E\left((X - EX)^k\right) = O\left(EX + (EX)^{k/2}\right) \]

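Fact 7. Let $r, l \in \mathbb{R}$. It holds that

\[ \sum_{l \ge 1} \frac{1}{(r + l)^4} \le \frac{1}{r^3} \]

Proof. We have

\[ \sum_{l \ge 1} \frac{1}{(r + l)^4} \le \int_0^\infty \frac{1}{(r + x)^4}\, dx = \left[ -\frac{1}{3} \cdot \frac{1}{(r + x)^3} \right]_0^\infty \le \frac{1}{r^3} \]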
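
As a quick numerical sanity check of the integral comparison, the sketch below evaluates a truncated version of the series against the value $\frac{1}{3r^3}$ of the integral. The truncation length and the sample values of $r$ are arbitrary assumptions, and the function names are hypothetical.

```python
def truncated_tail_sum(r, terms=100_000):
    """Partial sum of 1/(r+l)^4 for l = 1..terms (a lower bound on the full series)."""
    return sum(1.0 / (r + l) ** 4 for l in range(1, terms + 1))


def integral_bound(r):
    """Value of the integral of (r+x)^-4 over x in [0, infinity), i.e. 1/(3 r^3)."""
    return 1.0 / (3.0 * r ** 3)


for r in (0.5, 1.0, 4.0, 32.0):
    print(r, truncated_tail_sum(r), integral_bound(r))
```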



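Note that whether each element $i \in [n]$, $i < 2^\ell$ lies in $C_\ell$ is only dependent on
$h(i)$. Hence $|C_\ell| = \sum_{i \in [n],\, i < 2^\ell} [h(i) \in A]$ is the sum of 4-independent variables
with values in $[0,1]$ and hence we can use Fact 6 to give an upper bound
on (7). Combining Facts 6 and 7 and (7) we see that:

\begin{align}
E\left((|C_\ell| - \alpha 2^{\ell+1})^+\right) &= \sum_{r \ge 1} \Pr\left((|C_\ell| - \alpha 2^{\ell+1})^+ \ge r\right) \notag \\
&\le \sum_{r \ge 1} \frac{E\left(|C_\ell| - E|C_\ell|\right)^4}{(\alpha 2^\ell + r)^4} = O\left(\frac{1}{\alpha 2^\ell}\right) \tag{8}
\end{align}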


We will bound $E\left(\alpha 2^{\ell+1} \cdot [E_{\ell-1}]\right)$ (the second term of (5)) in a similar
fashion still assuming that $\alpha 2^\ell \ge 1$. For each $i \in [n]$ such that $i < 2^{\ell-1}$ let $Z_i = 1$
if $h(i) \in B$ and $Z_i = 0$ otherwise. Let $Z$ be the sum of these 4-independent
variables, then $E_{\ell-1}$ is equivalent to $Z = 0$. By Fact 6

\[ E([E_{\ell-1}]) = \Pr(Z = 0) \le \Pr\left(|Z - EZ| \ge EZ\right) \le \frac{E(Z - EZ)^4}{(EZ)^4} = O\left(\frac{1}{(EZ)^2}\right) \]

Since $E(Z) = \alpha \lfloor 2^{\ell-1} \rfloor$ we see that

\[ \alpha 2^{\ell+1} \cdot E([E_{\ell-1}]) = O\left(\frac{1}{\alpha 2^\ell}\right) \tag{9} \]

By combining (8), (9) and (5) we see that for any $\ell$ such that $\alpha 2^\ell \ge 1$:

\[ E\left(|C_\ell| [E_{\ell-1}]\right) \le O\left(\frac{1}{\alpha 2^\ell}\right) \tag{10} \]

Furthermore, for any $\ell$ such that $\alpha 2^\ell < 1$ we trivially get:

\[ E\left(|C_\ell| [E_{\ell-1}]\right) \le E(|C_\ell|) \le \alpha 2^\ell \tag{11} \]

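To conclude we combine (4), (10) and (11) and finish the proof:

\[ E(C) \le O\left( \sum_{\ell :\, \alpha 2^\ell \ge 1} \frac{1}{\alpha 2^\ell} + \sum_{\ell :\, \alpha 2^\ell < 1} \alpha 2^\ell \right) = O(1) \]

Both sums are geometric with largest term at most 1, which gives the $O(1)$ bound. □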


We now show that the cleanup phase as described by Setting 1 takes $O(n \log n)$
time for $k = 2$, which means it makes no difference to the asymptotic running
time of quicksort.

Lemma 2. Consider quicksort in Setting 1 where we sort a set $S =
\{x_0, \ldots, x_{n-1}\}$ with a 2-independent hash function. The cleanup phase takes
$O(n \log n)$ time.

Proof. Assume wlog. that $n$ is a power of 2. For each $\ell \in \{0, 1, \ldots, \lg n\}$ let $A_\ell$
be the set of dyadic intervals of size $2^\ell$, i.e.

\[ A_\ell = \left\{ [i 2^\ell, (i + 1) 2^\ell) \cap [n] \mid i \in [n 2^{-\ell}] \right\} \]

For any consecutive list of $s$ elements $x_i, \ldots, x_{i+s-1}$ such that none of them are
chosen as pivot elements, there exists a dyadic interval $I$ of size $\Omega(s)$ such that
none of $x_j$, $j \in I$ are chosen as pivot elements. Hence we only need to consider
the time it takes to sort elements corresponding to dyadic intervals. Let $P_\ell$ be
an upper bound on the probability that no element from $[0, 2^\ell)$ is chosen as a
pivot element. Then the total running time of the cleanup phase is bounded by:

\[ O\left( \sum_{\ell=0}^{\lg n} |A_\ell| P_\ell 2^{2\ell} \right) = O\left( n \sum_{\ell=0}^{\lg n} 2^\ell P_\ell \right) \tag{12} \]


Fix $\ell$ and let $X = \sum_{i=0}^{n-1} [h(i) \in [0, 2^\ell)]$. Then $E(X) = 2^\ell$, so by Markov's
inequality

\[ \Pr(X = 0) \le \Pr\left((X - E(X))^2 \ge (E(X))^2\right) \le \frac{E\left((X - E(X))^2\right)}{(E(X))^2} = O\left(\frac{1}{E(X)}\right) = O(2^{-\ell}) \]

Plugging this into (12) shows that the running time is bounded by $O(n \log n)$. □


Finally we show the new 2-independent bound. The argument follows as the
4-independent argument, except with 2nd moment bounds instead of 4th moment
bounds.

Theorem 8. Consider quicksort in Setting 1 where we sort a set $S =
\{x_0, \ldots, x_{n-1}\}$ and pivot elements are chosen using a 2-independent hash
function. For any $i$ the expected number of times $x_i$ is compared with another
element $x_j \in S \setminus \{x_i\}$ when $x_j$ is chosen as a pivot element is $O(\log^2 n)$. In
particular the expected running time is $O(n \log^2 n)$.


Proof. The proof of the $O(n \log^2 n)$ expected running time follows from an
argument analogous to that of Theorem 5. The main difference is that the analogous
lemma to Lemma 1 yields $E(C) = O(\log n)$ instead of $E(C) = O(1)$, which
implies the stated running time. This is due to the fact that as we have
2-independence we must use the weaker 2nd moment bounds instead of the 4th
moment bounds used e.g. in (7). Since the cleanup phase takes $O(n \log n)$
time even for $k = 2$ due to Lemma 2 the stated time holds. Otherwise the proof
follows analogously and we omit the full argument due to repetitiveness. □


6.1 Binary planar partitions and randomized treaps 

The result for quicksort shown in Theorem 5 has direct implications for two 
classic randomized algorithms. Both algorithms are explained in common
textbooks, e.g. Motwani-Raghavan.

A straightforward analysis of the randomized algorithm [12, Page 12] for
constructing binary planar partitions simply uses min-wise hashing to analyze the
expected size of the partition. In the analysis the size of the constructed
partition depends on the probability of the event that a line segment $u$
comes before a line segment $v$ in the random permutation $u, \ldots, u_i, v$. Using
the min-wise probabilities of Table 1 directly we get the same bounds on the
partition size as running times on quicksort using the min-wise analysis. This
analysis is tightened through Theorem 5 for both $k = 2$ and $k = 4$.

By an analogous argument, the randomized treap data structure of [12, Page
201] gets, using the min-wise bounds, expected node depth $O(\log n)$ when a treap
is built over a size $n$ set. Under limited independence using the min-wise analysis,
the bounds achieved are then $\{O(\sqrt{n}), O(\log^2 n), O(\log^2 n), O(\log n)\}$ for $k =
\{2, 3, 4, 5\}$ respectively. By Theorem 5 we get $O(\log^2 n)$ for $k = 2$ and $O(\log n)$
for $k = 4$.


7 Largest bucket size 

We explore the standard case of throwing $n$ balls into $n$ buckets using a random
hash function. We are interested in analyzing the bucket that has the largest
number of balls mapped to it. Particularly, for this problem our main
contribution is an explicit family of hash functions that are $k$-independent (remember
Definition 1) and where the largest bucket size is $\Omega(n^{1/k})$. However we start by
stating the matching upper bound.

7.1 Upper bound


We will briefly show the upper bound that matches our lower bound presented 
in the next section. We are unaware of literature that includes the upper bound, 
but note that it follows from a standard argument and is included for the sake 
of completeness. 

Lemma 3. Consider the setting where $n$ balls are distributed among $n$ buckets
using a random hash function $h$. For $m = \Omega\left(\frac{\log n}{\log\log n}\right)$ and any $k \in \mathbb{N}$ such
that $k < n^{1/k}$, if $h$ is $k$-independent the largest bucket size is $O(m)$ with
probability at least $1 - \frac{n}{m^k}$.

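Proof. Consider random variables $B_1, \ldots, B_n$, where $B_i$ denotes the number of
balls that are distributed to bin $i$. By definition, the largest bucket size is $\max_i B_i$.
Write $x^{\underline{k}} = x(x-1)\cdots(x-k+1)$ for the falling factorial. Since
$(\max_i B_i)^{\underline{k}} \le \sum_i (B_i)^{\underline{k}}$, for any threshold $t$ we see that

\[ \Pr\left(\max_i B_i \ge t\right) = \Pr\left((\max_i B_i)^{\underline{k}} \ge t^{\underline{k}}\right) \le \Pr\left(\sum_i (B_i)^{\underline{k}} \ge t^{\underline{k}}\right) \]

Since $\sum_i (B_i)^{\underline{k}}$ is exactly the number of ordered $k$-tuples being assigned to the
same bucket we see that $E\left(\sum_i (B_i)^{\underline{k}}\right) = n^{\underline{k}} \cdot \frac{1}{n^{k-1}}$, because there are exactly $n^{\underline{k}}$
ordered $k$-tuples. Hence we can apply Markov's inequality

\[ \Pr\left(\sum_i (B_i)^{\underline{k}} \ge t^{\underline{k}}\right) \le \frac{E\left(\sum_i (B_i)^{\underline{k}}\right)}{t^{\underline{k}}} = \frac{n^{\underline{k}}}{n^k} \cdot \frac{n}{t^{\underline{k}}} \le \frac{n}{t^{\underline{k}}} \]

Since $k < n^{1/k}$ implies $k = O\left(\frac{\log n}{\log\log n}\right)$ we see that $k + m = O(m)$. Letting
$t = k + m$ we get the desired upper bound $\frac{n}{m^k}$ on the probability that $\max_i B_i \ge
m + k$ since $(m + k)^{\underline{k}} \ge m^k$. □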
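
The counting identity behind the proof, namely that the sum of falling factorials of the bucket loads equals the number of ordered $k$-tuples of distinct balls sharing a bucket, can be checked by brute force on a tiny instance. This is only an illustrative sketch; the helper names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def falling_factorial(x, k):
    """x (x-1) ... (x-k+1); zero whenever 0 <= x < k."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


def colliding_ordered_tuples(buckets, k):
    """Brute-force count of ordered k-tuples of distinct balls in one common bucket."""
    balls = range(len(buckets))
    return sum(1 for tup in permutations(balls, k)
               if len({buckets[i] for i in tup}) == 1)


def sum_of_falling_factorials(buckets, k):
    """Sum over buckets of (load) to the falling power k."""
    loads = Counter(buckets)
    return sum(falling_factorial(load, k) for load in loads.values())


buckets = [0, 0, 0, 1, 1, 2]  # ball i is placed in bucket buckets[i]
print(colliding_ordered_tuples(buckets, 2), sum_of_falling_factorials(buckets, 2))  # 8 8
```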



7.2 Lower bound 

At a high level, our hashing scheme is to divide the buckets into sets of size p and 
in each set polynomial hashing is used on the keys that do not “fill” the set. The 
crucial point is then to see that for polynomial hashing, the probability that a 
particular polynomial hashes a set of keys to the same value can be bounded by 
the probability of all coefficients of the polynomial being zero. Having a bound 
on this probability, the set size can be picked such that with constant probability 
the coefficients of one of the polynomials are zero, resulting in a large bucket.

Proof (of Theorem 1). Fix $n$, $m$, and $k$. We will give a scheme to randomly choose
a vector $x = (x_0, \ldots, x_{n-1}) \in [n]^n$ such that the entries are $k$-independent.

First we choose some prime $p \in \left[\tfrac{1}{4}m, \tfrac{1}{2}m\right]$. This is possible by Bertrand's
postulate.

Let $t = \lfloor n/p \rfloor$ and partition $[n]$ into $t + 1$ disjoint sets $S_0, S_1, \ldots, S_t$, such that
$|S_i| = p$ when $i < t$ and $|S_t| = n - pt = (n \bmod p)$. Note that $S_t$ is empty if $p$
divides $n$.

The scheme is the following:

— First we pick $t$ polynomial hash functions $h_0, h_1, \ldots, h_{t-1} : [p] \to [p]$ of
degree $k - 1$, i.e. $h_i(x) = a_{i,k-1} x^{k-1} + \ldots + a_{i,0} \bmod p$ where each $a_{i,j}$ is
chosen uniformly at random from $[p]$.

— For each $x_i$ we choose which of the events $(x_i \in S_0), \ldots, (x_i \in S_t)$ are true
such that $P(x_i \in S_j) = \frac{|S_j|}{n}$. This is done independently for each $x_i$.

— For each $j = 0, \ldots, t - 1$ we let $Y_j = \{x_i \mid x_i \in S_j\}$ be the set of all $x_i$
contained in $S_j$. If $|Y_j| > p$ we let $Z_j \subseteq Y_j$ be a subset with $p$ elements and
$Z_j = Y_j$ otherwise. We write $Z_j = \{x_{i_0}, \ldots, x_{i_{r-1}}\}$ and $S_j = \{s_0, \ldots, s_{p-1}\}$.
Then we let $x_{i_\ell} = s_{h_j(\ell)}$ for $\ell \in [r]$. The values for $Y_j \setminus Z_j$ are chosen uniformly
in $S_j$ and independently.

— For all $x_i$ such that $(x_i \in S_t)$ we uniformly at random and independently
choose $s \in S_t$ such that $x_i = s$.

This scheme is clearly $k$-independent. The at most $p$ elements in $Y_j$ that we
distribute using a degree $k - 1$ polynomial are distributed $k$-independently, as
degree $k - 1$ polynomials over $\mathbb{Z}_p$ are known to be $k$-independent (see e.g. [10]).
The remaining elements are distributed fully independently.
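
The four steps of the scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction as implemented by the authors: the sets $S_j$ are taken to be consecutive ranges of bucket indices, $Z_j$ is taken to be the first $p$ balls that fall into $S_j$, and the function names `poly_eval` and `sample_buckets` are hypothetical.

```python
import random
from collections import Counter


def poly_eval(coeffs, x, p):
    """Evaluate a polynomial over Z_p by Horner's rule (leading coefficient first)."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % p
    return acc


def sample_buckets(n, p, k, rng):
    """Assign n balls to buckets 0..n-1 following the four steps of the scheme."""
    t = n // p
    sets = [list(range(j * p, (j + 1) * p)) for j in range(t)]
    sets.append(list(range(t * p, n)))  # S_t, empty when p divides n
    # Step 1: one random polynomial with k coefficients (degree k-1) per full set.
    polys = [[rng.randrange(p) for _ in range(k)] for _ in range(t)]
    # Step 2: each ball falls into S_j with probability |S_j| / n.
    chosen = rng.choices(range(t + 1), weights=[len(s) for s in sets], k=n)
    placed = [0] * (t + 1)  # balls seen so far in each set
    buckets = []
    for j in chosen:
        rank = placed[j]
        placed[j] += 1
        if j < t and rank < p:
            # Step 3: the first p balls of S_j form Z_j and use the polynomial.
            buckets.append(sets[j][poly_eval(polys[j], rank, p)])
        else:
            # Overflow balls of S_j (step 3) and all balls of S_t (step 4).
            buckets.append(rng.choice(sets[j]))
    return buckets


buckets = sample_buckets(100, 7, 2, random.Random(3))
print(max(Counter(buckets).values()))
```

When all non-constant coefficients of some $h_j$ are zero, `poly_eval` returns the same value for every rank, so every ball of $Z_j$ lands in one bucket; this is the event the lower bound exploits.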

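Abusing notation, we write $|S_i| = \sum_{j=0}^{n-1} [x_j \in S_i]$ for the number of entries assigned
to $S_i$, and therefore $|S_i|$ is the sum of independent variables from $\{0, 1\}$. Since
$E(|S_i|) = p = \omega(1)$ a standard Chernoff bound gives us that

\[ \Pr\left(|S_i| \le \left(1 - \tfrac{1}{2}\right) p\right) \le e^{-\Omega(p)} = o(1). \tag{13} \]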


For $i \in [t]$ let $X_i$ be 1 if $S_i$ consists of at least $p/2$ elements and 0 otherwise.
In other words $X_i = [|S_i| \ge p/2]$. By (13) we see that $E(X_i) = 1 - o(1)$. Let
$X = \sum_{i=0}^{t-1} X_i$. Then $E(X) = t(1 - o(1))$, so we can apply Markov's inequality
to obtain

\[ \Pr\left(X \le \tfrac{1}{2} t\right) = \Pr\left(t - X \ge \tfrac{1}{2} t\right) \le \frac{E(t - X)}{t/2} = o(1). \]

So with probability $1 - o(1)$ at least half of the sets $S_i$, $i \in [t]$ contain at least
$p/2$ elements. Assume that this happens after we for every $x_i$ fix the choice of
$S_j$ such that $x_i \in S_j$, i.e. assume $X \ge t/2$. Wlog. assume that $S_0, \ldots, S_{\lceil t/2 \rceil - 1}$
contain at least $p/2$ elements. For each $j \in [\lceil t/2 \rceil]$ let $Y_j$ be 1 if $h_j$ is constant
and 0 otherwise. That is, $Y_j = [a_{j,k-1} = \ldots = a_{j,1} = 0]$. We note that $Y_j$ is 1
with probability $\frac{1}{p^{k-1}}$. Since $Y_0, \ldots, Y_{\lceil t/2 \rceil - 1}$ are independent we see that

\[ \Pr\left(Y_0 + \ldots + Y_{\lceil t/2 \rceil - 1} > 0\right) = 1 - \left(1 - \frac{1}{p^{k-1}}\right)^{\lceil t/2 \rceil} \ge 1 - e^{-\lceil t/2 \rceil / p^{k-1}} = 1 - e^{-\Theta(n/p^k)} \]

Since $p \le m$ we see that $e^{-\Theta(n/p^k)} \le e^{-\Theta(n/m^k)}$; furthermore $n/m^k \le 1$ by
assumption and so $e^{-\Theta(n/m^k)} = 1 - \Omega\left(\frac{n}{m^k}\right)$. This proves that at least one
$h_j$, $j \in [\lceil t/2 \rceil]$ is constant with probability $\Omega\left(\frac{n}{m^k}\right)$. And if that is the case at
least one bucket has size $\ge p/2 = \Omega(m)$. This proves the theorem under the
assumption that $X \ge t/2$. Since $X \ge t/2$ happens with probability $1 - o(1)$ this
finishes the proof. □


Since it is well known that using an $O(\log n / \log\log n)$-independent hash
function to distribute the balls will imply largest bucket size $\Omega(\log n / \log\log n)$,
Corollary 1 provides the full understanding of the largest bucket size.

Proof (of Corollary 1). Part (a) follows directly from Theorem 1. Part (b) follows
since $k > n^{1/k}$ implies $k > \log n / \log\log n$ and so we apply the $\Omega(\log n / \log\log n)$
bound from [16]. □

A Appendix

A.1 Quicksort in Setting 2

The analog to Lemma 1 that we need in order to prove that quicksort in Setting
2 using a 4-independent hash function runs in expected $O(n \log n)$ time is proved
below.

Lemma 4. Let $h : X \to (0,1)$ be a 4-independent hash function and $A, B \subseteq X$
disjoint sets such that $|A| \le |B|$. Then

\[ E\left( \left| \left\{ a \in A \mid h(a) < \min_{b \in B} h(b) \right\} \right| \right) = O(1) \]
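
The set whose expected size Lemma 4 bounds can be computed directly, as in the sketch below. The simulation assumes fully random hash values in $[0,1)$ rather than a 4-independent family, so it illustrates the quantity but does not test the lemma. The function names are hypothetical.

```python
import random


def keys_below_min(h, A, B):
    """Keys a in A whose hash value is below the smallest hash value in B.

    h is any indexable mapping from keys to hash values (dict or list).
    """
    threshold = min(h[b] for b in B)
    return {a for a in A if h[a] < threshold}


def estimate_expected_size(n, trials, rng):
    """Average size of the set for |A| = |B| = n (assumption: fully random h)."""
    A = range(n)
    B = range(n, 2 * n)
    total = 0
    for _ in range(trials):
        h = [rng.random() for _ in range(2 * n)]
        total += len(keys_below_min(h, A, B))
    return total / trials


h = {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.9}
print(keys_below_min(h, {"a", "b"}, {"c", "d"}))  # {'a'}
print(estimate_expected_size(20, 2000, random.Random(11)))
```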


Proof. Wlog assume that $|A| = |B| = n$. Let $Y$ be defined by

\[ Y = \left\{ a \in A \mid h(a) < \min_{b \in B} h(b) \right\} \]

If $a \in Y$ then either $h(a) < \frac{1}{n}$ or there exists $k \in \mathbb{N}$ such that $h(a) < 2^{-k+1}$ and
$\min_{b \in B} h(b) \ge 2^{-k}$, where we can choose $k$ such that $2^{-k+1} \ge \frac{1}{n}$, i.e. $2^k \le 2n$.

Let $Y_k$ be the set of all keys $a \in A$ satisfying $h(a) < 2^{-k+1}$ and let $E_k$ be the
event that $\min_{b \in B} h(b) \ge 2^{-k}$. Also let $1_{E_k}$ denote the indicator variable defined
as being 1 when event $E_k$ occurs and 0 otherwise. Since the expected number of
keys in $A$ hashing below $\frac{1}{n}$ is 1 we see that:

\[ E(|Y|) \le 1 + \sum_{k=1}^{\lfloor \lg n \rfloor + 1} E\left(|Y_k| 1_{E_k}\right) \]

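Now note that:

\[ E\left(|Y_k| 1_{E_k}\right) \le E\left(|Y_k| - 2^{-k+2} n\right)^+ + 2^{-k+2} n \cdot E(1_{E_k}) \tag{14} \]

where $x^+$ is defined as $\max\{x, 0\}$.

First we will bound $E\left(|Y_k| - 2^{-k+2} n\right)^+$. Note that for any $\ell \in \mathbb{N}$:

\begin{align}
\Pr\left((|Y_k| - 2^{-k+2} n)^+ \ge \ell\right) &= \Pr\left(|Y_k| - E(|Y_k|) \ge 2^{-k+1} n + \ell\right) \notag \\
&\le \frac{E\left(|Y_k| - E|Y_k|\right)^4}{(2^{-k+1} n + \ell)^4} \tag{15}
\end{align}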


Remember that we consider a 4-independent hash function $h$. Next we wish
to upper bound $E\left(|Y_k| - E(|Y_k|)\right)^4$ (the numerator of (15)). Consider indicator
variables $X_a$ for all $a \in A$ such that $X_a = 1$ if $a \in Y_k$ and 0 otherwise. By the
definition of $Y_k$ we have $|Y_k| = \sum_{a \in A} X_a$ and $E\left(\sum_{a \in A} X_a\right) = O(2^{-k+1} n)$.

\begin{align}
E\left(|Y_k| - E(|Y_k|)\right)^4 &= E\left( \sum_{a \in A} \left(X_a - E(X_a)\right) \right)^4 \notag \\
&= O\left( n E\left(X_a - E(X_a)\right)^4 + n^2 \left( E\left((X_a - E(X_a))^2\right) \right)^2 \right) \notag \\
&= O\left( E\left(\sum_{a \in A} X_a\right)^2 \right) = O\left((2^{-k} n)^2\right) \tag{16}
\end{align}

Consider now the following fact, which we will use to bound a particular type of
sum.

Fact 9. Let $r, l \in \mathbb{R}$. It holds that

\[ \sum_{l \ge 1} \frac{1}{(r + l)^4} \le \frac{1}{r^3} \]

Proof. We have

\[ \sum_{l \ge 1} \frac{1}{(r + l)^4} \le \int_0^\infty \frac{1}{(r + x)^4}\, dx = \left[ -\frac{1}{3} \cdot \frac{1}{(r + x)^3} \right]_0^\infty \le \frac{1}{r^3} \]

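By application of Fact 9 and using our bound from (16) we can finish the upper
bound on (15):

\begin{align*}
E\left(|Y_k| - 2^{-k+2} n\right)^+ &= \sum_{\ell \ge 1} \Pr\left((|Y_k| - 2^{-k+2} n)^+ \ge \ell\right) \\
&= O\left((2^{-k} n)^2\right) \sum_{\ell \ge 1} \frac{1}{(2^{-k+1} n + \ell)^4} \\
&= O\left(\frac{1}{2^{-k} n}\right)
\end{align*}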

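We only need to bound $2^{-k+2} n \cdot E(1_{E_k})$ (the second term of (14)) in order to
finish the proof. For each $b \in B$ let $Z_b = 1$ if $h(b) < 2^{-k}$ and $Z_b = 0$ otherwise.
Then $E_k$ implies that $Z_b = 0$. Let $Z = \sum_{b \in B} Z_b$. Then by an equivalent
argument as used for (16):

\[ E(1_{E_k}) = \Pr(Z = 0) \le \Pr\left(|Z - EZ| \ge EZ\right) \le \frac{E(Z - EZ)^4}{(EZ)^4} = O\left(\frac{1}{(EZ)^2}\right) \]

Since $E(Z) = 2^{-k} n$ we see that

\[ 2^{-k+2} n \cdot E(1_{E_k}) = O\left(\frac{1}{2^{-k} n}\right) \]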

To conclude, we insert our bounds on the two terms of (14), which completes
the proof.

\begin{align*}
E(|Y|) &\le 1 + \sum_{k=1}^{\lfloor \lg n \rfloor + 1} \left( E\left(|Y_k| - 2^{-k+2} n\right)^+ + 2^{-k+2} n \cdot E(1_{E_k}) \right) \\
&= 1 + \sum_{k=1}^{\lfloor \lg n \rfloor + 1} O\left(\frac{1}{2^{-k} n}\right) = O(1)
\end{align*}